Machine translation using isomorphic UCGs

Authors

  • John L. Beaven
  • Pete Whitelock
Abstract

This paper discusses the application of Unification Categorial Grammar (UCG) to the framework of Isomorphic Grammars for Machine Translation pioneered by Landsbergen. The Isomorphic Grammars approach to MT involves developing the grammars of the Source and Target languages in parallel, in order to ensure that SL and TL expressions which stand in the translation relation have isomorphic derivations. The principal advantage of this approach is that knowledge concerning translation equivalence of expressions may be directly exploited, obviating the need for answers to semantic questions that we do not yet have. Semantic and other information may still be incorporated, but as constraints on the translation relation, not as levels of textual representation. After introducing this approach to MT system design, and the basics of monolingual UCG, we will show how the two can be integrated, and present an example from an implemented bidirectional English-Spanish fragment. Finally we will present some outstanding problems with the approach.

1 Background and Introduction

The aim of this paper is to explore how the linguistic theory known as Unification Categorial Grammar can be adapted to the general methodology of Machine Translation using Isomorphic Grammars, as pioneered by Landsbergen and others in the ROSETTA team [Landsbergen 87a, b]. UCG is one of several recent grammar formalisms [Calder et al. 86, Karttunen 86, Pollard 85] which are highly lexicalist, i.e. rules of syntactic combination are not a language-specific component of the grammar, but are very general in character, and combinatory information is primarily associated with lexical items. Lexical items are represented by sets of feature-value pairs (where the values may themselves be sets of such pairs), and are combined by unification into objects of the same type. The language defined is thus the closure of the lexicon under the combinatory rules.
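The combination of feature-value sets by unification can be illustrated with a minimal Python sketch. It assumes feature structures are plain nested dictionaries with atomic string values; the feature names (`cat`, `subj`, `agr`, `num`, `per`) are hypothetical and are not the sign structure defined by UCG. Variables and re-entrancy are deliberately left out.

```python
from typing import Any, Dict, Optional

FeatureStructure = Dict[str, Any]


def unify(a: Any, b: Any) -> Optional[Any]:
    """Return the unification of a and b, or None if their values clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy so the inputs are not modified
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # a clash anywhere makes the whole unification fail
                result[feature] = merged
            else:
                result[feature] = value  # information present only in b is added
        return result
    # Atomic values unify only when they are identical.
    return a if a == b else None


# Hypothetical feature structures, for illustration only.
verb = {"cat": "s", "subj": {"agr": {"num": "sg"}}}
noun = {"subj": {"cat": "np", "agr": {"num": "sg", "per": "3"}}}

combined = unify(verb, noun)
clash = unify({"agr": {"num": "sg"}}, {"agr": {"num": "pl"}})
```

Here `combined` carries the information from both inputs, while `clash` is `None` because the two values for `num` are incompatible. The result of a successful unification is again a feature structure, which is what lets combined objects be of the same type as the lexical items they are built from.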
Landsbergen's work on Isomorphic Grammars follows Montague's approach of having a one-to-one correspondence between syntactic and semantic rules. A syntactic rule R_SL in the Source Language corresponds to a syntactic rule R_TL in the Target Language if and only if they are both associated with the same semantic operation R_sem. The translation relation is then defined in a precise manner and it can be guaranteed that well-formed expressions in the Source Language are translatable, as there will be an expression in the Target Language that is derived in a corresponding way, and can therefore be considered as a possible translation of it.

* Supported by a studentship from the Science and Engineering Research Council.

According to Landsbergen, writing isomorphic grammars is a way of being explicit about the "tuning" of SL and TL grammars that is essential for reliable MT. The present paper is an attempt to adapt this approach to a type-driven mapping between syntax and semantics.

2 Isomorphic Grammars

We can recognise two basic relations of relevance in translation, namely "possible translation" (which is symmetric), and "best translation" given the current context and much extra-linguistic knowledge (which is not symmetric). We take the task of the linguistic component of an MT system to be a correct and complete characterisation of the former, and will have nothing further to say about the latter.

An important problem that arises in an interlingual translation system is what Landsbergen [Landsbergen 87b] calls the "subset problem". If the analysis component generates a set L of interlingual expressions, and the generation component accepts a set L' of them, the only sentences that can be translated are those that correspond to expressions in the intersection L ∩ L'. If the grammars of the source and target languages are written independently, there is no way of guaranteeing that they map the languages into the same subset.
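A toy illustration of the subset problem, assuming interlingual expressions are written as plain strings (the expressions themselves are invented for this sketch):

```python
# L: interlingual expressions produced by a hypothetical analysis component.
analysis_output = {"and(rain, cold)", "not(happy(john))"}

# L': interlingual expressions accepted by a hypothetical generation component.
generation_input = {"and(cold, rain)", "not(happy(john))"}

# Only expressions in the intersection of L and L' can be translated.
translatable = analysis_output & generation_input
untranslatable = analysis_output - generation_input
```

`translatable` contains only `not(happy(john))`. The expression `and(rain, cold)` is logically equivalent to `and(cold, rain)`, yet it falls outside the intersection because the two independently written components chose different forms.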
The problem arises because a sufficiently powerful system of interlingual representation will contain an infinite number of logically equivalent expressions that represent a meaning of a given Source Language expression. Of course, the Source Language grammar will only associate a single one of these with a given SL expression. However, in the absence of specific tuning, this is not guaranteed to be the same one that the Target Language grammar associates with any of the translation equivalents. Therefore, SL and TL grammars must be tuned to each other. This is not a problem specific to interlingual translation: in the transfer approach to MT system design, this tuning is effected by an explicit transfer module. The use of Isomorphic Grammars is another way of being explicit about this, tuning the grammars themselves rather than their inputs/outputs, which offers a greater possibility of bi-directionality than the transfer approach.

Landsbergen assumes the existence of compositional grammars for two languages, that is, grammars in which i) basic expressions correspond to semantic primitives and ii) each syntactic rule that builds up a complex linguistic expression from simpler ones is paired with a semantic rule that builds the meaning of the complex expression from the meanings of the simpler ones. The tuning of grammars consists in ensuring that there is a basic expression in one grammar corresponding to each basic expression in the other, and that for each semantic rule there is a corresponding syntactic rule in each grammar. Two expressions are then considered possible translations of each other if they can be derived from corresponding basic expressions by applying corresponding syntactic rules. In other words, they are possible translations of each other if they are built from expressions having the same meaning, by using syntactic rules that perform the same semantic operations.
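The idea of deriving both expressions from corresponding basic expressions and rules can be sketched in Python as follows. Everything in it is an assumption made for illustration: the two-word lexicons, the semantic primitive names (`JOHN`, `ANNA`, `SEE`) and the rule names (`APPLY_OBJ`, `APPLY_SUBJ`) are hypothetical, and the sketch is neither the ROSETTA rule system nor the English-Spanish fragment mentioned in the abstract.

```python
# Each grammar maps semantic primitives to basic expressions and
# semantic rule names to the syntactic rule that realises them.
ENGLISH = {
    "basic": {"JOHN": "John", "ANNA": "Anna", "SEE": "sees"},
    "rules": {
        "APPLY_OBJ": lambda verb, obj: f"{verb} {obj}",
        "APPLY_SUBJ": lambda vp, subj: f"{subj} {vp}",
    },
}
SPANISH = {
    "basic": {"JOHN": "Juan", "ANNA": "Ana", "SEE": "ve"},
    "rules": {
        "APPLY_OBJ": lambda verb, obj: f"{verb} a {obj}",
        "APPLY_SUBJ": lambda vp, subj: f"{subj} {vp}",
    },
}


def is_tuned(g1, g2):
    """Check that both grammars cover the same primitives and semantic rules."""
    return (g1["basic"].keys() == g2["basic"].keys()
            and g1["rules"].keys() == g2["rules"].keys())


def realise(derivation, grammar):
    """Build the expression a grammar assigns to a semantic derivation tree.

    A derivation is a semantic primitive (str) or a tuple
    (semantic_rule_name, sub_derivation, ...).
    """
    if isinstance(derivation, str):
        return grammar["basic"][derivation]
    rule_name, *args = derivation
    parts = [realise(arg, grammar) for arg in args]
    return grammar["rules"][rule_name](*parts)


tree = ("APPLY_SUBJ", ("APPLY_OBJ", "SEE", "ANNA"), "JOHN")
english = realise(tree, ENGLISH)   # "John sees Anna"
spanish = realise(tree, SPANISH)   # "Juan ve a Ana"
```

Because both strings are built from the same derivation tree, they count as possible translations of each other in the sense defined above, and nothing in `realise` privileges one language as the source. A full system would also need a parser that recovers the derivation tree from a source string, which this sketch omits.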
Note the lack of directional specificity in this definition of the "possible translation" relation.

3 The (monolingual) UCG formalism

Many recent grammar formalisms [Shieber 86] represent linguistic objects as sets of attribute-value pairs. Values taken by these attributes may be atomic, variables, or they may themselves be sets of attribute-value pairs, so these objects may be thought of as Directed Acyclic Graphs (DAGs), in which directed arcs represent features, and the nodes at the end of these represent values. Such formalisms typically support re-entrancy, that is, they provide a mechanism for specifying that objects at the end of different paths are the same object. Unification Categorial Grammar is such a formalism, which combines a categorial treatment of syntax with semantics similar to Kamp's Discourse Representation [Kamp 81]. Each linguistic expression licensed by the grammar corresponds to what is called a sign. A sign consists of four main entries or features, which are explained below:

1. phonology (orthography in the present case)

Similar resources

Non-Isomorphic Forest Pair Translation

This paper studies two issues, non-isomorphic structure translation and target syntactic structure usage, for statistical machine translation in the context of forest-based tree to tree sequence translation. For the first issue, we propose a novel non-isomorphic translation framework to capture more non-isomorphic structure mappings than traditional tree-based and tree-sequence-based translatio...

The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...

Machine Translation Based On Logically Isomorphic Montague Grammars

Usually two approaches to machine translation are distinguished: the interlingual approach and the transfer approach (cf. Hutchins [1]). In the interlingual approach translation is a two-stage process: from source language to interlingua and from interlingua to target language. In the transfer approach there are three stages: source language analysis, transfer and target language generation. Th...

Extraction of Syntactic Translation Models from Parallel Data using Syntax from Source and Target Languages

We propose a generic rule induction framework that is informed by syntax from both sides of a parsed parallel corpus, as sets of structural, boundary and labeling related constraints. Factoring syntax in this manner empowers our framework to work with independent annotations coming from multiple resources and not necessarily a single syntactic structure. We then explore the issue of lexical cov...

Learning Non-Isomorphic Tree Mappings for Machine Translation

Often one may wish to learn a tree-to-tree mapping, training it on unaligned pairs of trees, or on a mixture of trees and strings. Unlike previous statistical formalisms (limited to isomorphic trees), synchronous TSG allows local distortion of the tree topology. We reformulate it to permit dependency trees, and sketch EM/Viterbi algorithms for alignment, training, and decoding.

Publication date: 1988